176 research outputs found

    Using Discharge Abstracts to Evaluate a Regional Perinatal Network: Assessment of the Linkage Procedure of Anonymous Data

    To assess the Burgundy perinatal network (18 obstetrical units; 18 500 births per year), discharge abstracts and additional data were collected for all mothers and newborns. In accordance with French law, data were rendered anonymous before statistical analysis, and were linked to patients using a specific procedure. This procedure allowed data concerning each mother to be linked to those for her newborn(s). This study showed that all mothers and newborns were included in the regional database; the data for all mothers were linked to those for their infant(s) in all cases. Additional data (gestational age) were obtained for 99.9% of newborns.

    EHRtemporalVariability: delineating temporal data-set shifts in electronic health records

    Background: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. Results: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. Conclusions: EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is designed for technical users who use the R package programmatically, as well as for users unfamiliar with programming, who can work through the Shiny user interface.

    This work was supported by Universitat Politecnica de Valencia grant PAID-00-17, Generalitat Valenciana grant BEST/2018, and projects H2020-SC1-2016-CNECT No. 727560 and H2020-SC1-BHC-2018-2020 No. 825750.

    Sáez Silvestre, C.; Gutiérrez-Sacristán, A.; Kohane, I.; Garcia-Gomez, JM.; Avillach, P. (2020). EHRtemporalVariability: delineating temporal data-set shifts in electronic health records. GigaScience. 9(8):1-7. https://doi.org/10.1093/gigascience/giaa079
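The idea of estimating a coded variable's distribution per time batch and looking for shifts can be illustrated with a minimal Python sketch. It is not the EHRtemporalVariability API (the package itself is written in R). The function names, the monthly batching, the toy records, and the choice of the Jensen-Shannon distance as the dissimilarity between batches are all assumptions made for illustration; the abstract does not name the measure it uses.

```python
import math
from collections import Counter, defaultdict


def monthly_distributions(records):
    """Group (ISO date, code) pairs by month and return relative frequencies."""
    counts = defaultdict(Counter)
    for date, code in records:
        counts[date[:7]][code] += 1  # "YYYY-MM-DD" -> "YYYY-MM"
    distributions = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        distributions[month] = {c: n / total for c, n in counts[month].items()}
    return distributions


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (0 = identical, 1 = disjoint)."""
    codes = set(p) | set(q)
    mix = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in codes}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture, skipping zero terms.
        return sum(a[c] * math.log2(a[c] / mix[c]) for c in a if a[c] > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))


def consecutive_shifts(distributions):
    """Distance between each pair of consecutive monthly distributions."""
    months = list(distributions)
    return {
        (prev, cur): jensen_shannon_distance(distributions[prev], distributions[cur])
        for prev, cur in zip(months, months[1:])
    }


# Invented toy data: diagnosis codes that switch coding system in October.
records = [
    ("2015-08-03", "250.00"), ("2015-08-17", "401.9"),
    ("2015-09-02", "250.00"), ("2015-09-21", "401.9"),
    ("2015-10-05", "E11.9"), ("2015-10-19", "I10"),
]
shifts = consecutive_shifts(monthly_distributions(records))
```

In this toy example the August and September batches have the same distribution, while September and October share no codes, so the second distance is the maximum the measure allows. A row of per-month relative frequencies like the one returned by `monthly_distributions` is also the kind of input a temporal heat map would be drawn from.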

    Building Application-Related Patient Identifiers: What Solution for a European Country?

    We propose a method utilizing a derived social security number with the same reliability as the social security number. We show that anonymity techniques classically based on unidirectional hash functions (such as the Secure Hash Algorithm (SHA-2) function) can guarantee the security, quality, and reliability of information if these techniques are applied to the social security number. Hashing produces a strictly anonymous code that is always the same for a given individual, and thus enables patient data to be linked. Different solutions are developed and proposed in this article. Hashing the social security number will make it possible to link the information in the personal medical file to other national health information sources with the aim of completing or validating the personal medical record or conducting epidemiological and clinical research. This data linkage would meet the anonymous data requirements of the European directive on data protection.
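As a rough illustration of linkage through hashed identifiers, the following Python sketch derives the same pseudonym from the same number in two sources and joins on it. It is a sketch under stated assumptions, not the article's procedure: the keyed HMAC construction, the normalization step, the function names, the key, and the sample identifier and records are all hypothetical. Only the use of a SHA-2 function (here SHA-256) comes from the abstract.

```python
import hashlib
import hmac


def pseudonymize(ssn: str, secret_key: bytes) -> str:
    """Return a one-way SHA-256 based code for a social security number.

    Normalizing first (assumption) makes formatting differences between
    sources irrelevant. The secret key (assumption) hinders dictionary
    attacks on the small space of valid numbers.
    """
    normalized = "".join(ch for ch in ssn if ch.isalnum()).upper()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(records, secret_key):
    """Replace each (ssn, data) pair by an entry keyed on the pseudonym."""
    return {pseudonymize(ssn, secret_key): data for ssn, data in records}


# Hypothetical key and invented records, for illustration only.
KEY = b"example-key-not-for-production"
hospital = anonymize([("1 85 05 78 006 084 36", {"diagnosis": "E11.9"})], KEY)
registry = anonymize([("185057800608436", {"vital_status": "alive"})], KEY)

# Records are linked on the pseudonym; the original number never appears.
linked = {pid: (hospital[pid], registry[pid]) for pid in hospital.keys() & registry.keys()}
```

Both sources must apply the same normalization and the same key for the codes to match, which is why that step is shared in one function here.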

    Multi-PheWAS intersection approach to identify sex differences across comorbidities in 59 140 pediatric patients with autism spectrum disorder

    Objective: To identify differences related to sex and define female-enriched autism spectrum disorder (ASD) comorbidities through a comprehensive multi-PheWAS intersection approach on big, real-world data. Although sex difference is a consistent and recognized feature of ASD, additional clinical correlates could help to identify potential disease subgroups, based on sex and age. Materials and Methods: We performed a systematic comorbidity analysis on 1860 groups of comorbidities covering the full spectrum of known disease, in 59 140 individuals (11 440 females) with ASD from 4 age groups. We explored ASD sex differences in 2 independent real-world datasets, across all potential comorbidities by comparing (1) females with ASD vs males with ASD and (2) females with ASD vs females without ASD. Results: We identified 27 different comorbidities that appeared significantly more frequently in females with ASD. The comorbidities were mostly neurological (eg, epilepsy, odds ratio [OR]>1.8, 3-18 years of age), congenital (eg, chromosomal anomalies, OR>2, 3-18 years of age), and mental disorders (eg, intellectual disability, OR>1.7, 6-18 years of age). Novel comorbidities included endocrine metabolic diseases (eg, failure to thrive, OR=2.5, ages 0-2), digestive disorders (gastroesophageal reflux disease: OR=1.7, 6-11 years of age; and constipation: OR>1.6, 3-11 years of age), and sense organs (strabismus: OR>1.8, 3-18 years of age). Discussion: A multi-PheWAS intersection approach on real-world data as presented in this study uniquely contributes to the growing body of research regarding sex-based comorbidity analysis in the ASD population. Conclusions: Our findings provide insights into female-enriched ASD comorbidities that are potentially important in diagnosis, as well as the identification of distinct comorbidity patterns influencing anticipatory treatment or referrals.

    This work has been supported by the National Institutes of Health BD2K grant U54HG007963. JMZ received grants from Stichting de Drie Lichten and Stichting Sophia Kinderziekenhuis Fonds for a research internship at Harvard Medical School.

    Gutiérrez-Sacristán, A.; Sáez Silvestre, C.; De Niz, C.; Jalali, N.; Desain, TN.; Kumar, R.; Zachariasse, JM.... (2021). Multi-PheWAS intersection approach to identify sex differences across comorbidities in 59 140 pediatric patients with autism spectrum disorder. Journal of the American Medical Informatics Association. 29(2):230-238. https://doi.org/10.1093/jamia/ocab144
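For readers unfamiliar with the odds ratios quoted in this abstract, the sketch below shows the textbook calculation from a 2x2 table, with a Wald (log-scale) 95% confidence interval. The abstract does not say how its odds ratios were estimated or adjusted, so this is only the standard unadjusted formula. The function name and the counts are invented for illustration and are not taken from the study.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: group of interest, with the comorbidity
    b: group of interest, without the comorbidity
    c: comparison group, with the comorbidity
    d: comparison group, without the comorbidity
    All four counts must be positive for the formula to apply.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Invented counts: 30 of 100 in the group of interest vs 20 of 100 in the comparison group.
estimate, lower, upper = odds_ratio(30, 70, 20, 80)
```

An odds ratio above 1 with a confidence interval that excludes 1 is the usual signal that the condition is more frequent in the group of interest. A phenome-wide analysis repeats this kind of comparison across many conditions, so it would also need a correction for multiple testing, which this sketch leaves out.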

    Combining clinical and genomics queries using i2b2 – Three methods

    We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty representing genomic data and its annotations has almost by itself led to the recognition of a biomedical “Big Data” challenge, and the complexity of healthcare data only compounds the problem to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: the first is based on Sequence Ontology, the second on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine.

    Methotrexate and relative risk of dementia amongst patients with rheumatoid arthritis: A multi-national multi-database case-control study

    Background: Inflammatory processes have been shown to play a role in dementia. To understand this role, we selected two anti-inflammatory drugs (methotrexate and sulfasalazine) to study their association with dementia risk. Methods: A retrospective matched case-control study of patients over 50 with rheumatoid arthritis (486 dementia cases and 641 controls) who were identified from ele

    Development of the Precision Link Biobank at Boston Children’s Hospital: Challenges and Opportunities

    Increasingly, biobanks are being developed to support organized collections of biological specimens and associated clinical information on broadly consented, diverse patient populations. We describe the implementation of a pediatric biobank comprising a fully informed patient cohort and linking specimens to phenotypic data derived from electronic health records (EHR). The Biobank was launched after multiple stakeholders’ input and implemented initially in a pilot phase before hospital-wide expansion in 2016. In-person informed consent is obtained from all participants enrolling in the Biobank and provides permission to: (1) access EHR data for research; (2) collect and use residual specimens produced as by-products of routine care; and (3) share de-identified data and specimens outside of the institution. Participants are recruited throughout the hospital, across diverse clinical settings. We have enrolled 4900 patients to date, and 41% of these have an associated blood sample for DNA processing. Current efforts are focused on aligning the Biobank with other ongoing research efforts at our institution and extending our electronic consenting system to support remote enrollment. A number of pediatric-specific challenges and opportunities are reviewed, including the need to re-consent patients when they reach 18 years of age, the ability to enroll family members accompanying patients, and alignment with disease-specific research efforts at our institution and other pediatric centers to increase cohort sizes, particularly for rare diseases.